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Abstract 

A new method for estimating Sobol' indices is proposed. The new 
method makes use of 3 independent input vectors rather than the usual 
2. It attains much greater accuracy on problems where the target Sobol' 
index is small, even outperforming some oracles which adjust using the 
true but unknown mean of the function. When the target Sobol' index is 
quite large, the oracles do better than the new method. 

1 Introduction 

Let / be a deterministic function on [0, l] d for d ^ 1. Sobol' sensitivity indices, 
derived from a functional ANOVA, are used to measure the importance of sub- 
sets of input variables. There are two main types of index, but one of them is 
especially hard to estimate in cases where that index is small. 

The problematic index can be represented as a covariance between outcomes 
of / evaluated at two random input points, that share some but not all of 
their components. A natural estimator then is a sample covariance b ased on 
pairs of random d- vectors of this type. Sobol' and Mvshetskava ( 2007h report 



a numerical experiment where enormous efficiency differences obtain depending 
on how one estimates that covariance. The best gains arise from applying some 
centering strategies to those pairs of function evaluations. 

This article introduces a new estimator for the Sobol' index, based on three 
input vectors, not two. The new estimator makes perhaps surprising use of 
randomly generated centers. The random centering adds to the cost of every 
simulation run and might be thought to add noise. But in many examples 
that noise must be strongly negatively correlated with the quantity it adjusts 
because (in those examples) the random centering greatly increases efficiency. 
The new estimate is not always most efficient. In particular when the index 
to be estimated is large the new estimate is seen to perform worse than some 
oracles that one could approximate numerically. 

The motivation behind Sobol' indices, is well explained in the text bv lSaltelli et al 



(2008). These indices have been applied to problems in industry, science and 
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public health. For a recent mathematical account of Sobol' indices, see lOwen 
( 20121) . 

The outline of this article is as follows. Section [5] introduces Sobol' indices 
and our notation. Section [3] presents the original estimator of the Sobol' indices 
and the four improved estimators we consider here. Section 0] considers some 
numerical examples. For small Sobol' indices, the newly proposed estimator is 
best, beating two oracles. For very large indices, the best performance comes 
from an oracle that uses the true function mean twice. Section [5] presents some 
theoretical support for the new estimator. It generalizes the estimator to a wider 
class of methods and shows that the proposed estimator minimizes a proxy for 
the variance, when one considers functions of product form. 



2 Background 

For d ^ 1, let / € L 2 [0, l] d . Then / can be written in an ANOVA decomposition 
as a sum of a constant fi — J, Q 1 j d f(x) dx and 2 d — 1 mutually orthogonal 
ANOVA effects, one for each nonempty subset of T> = {1,2, ... ,d}. The effect 
for non-empty subset u C T> has variance a 2 , while a% = 0. A larger a 2 means 
a more important interaction among those variables, but Sobol' indices account 
for the fact that the importance of a set of variables also depends on other 
interactions in which they participate. 

The two most important Sobol' indices are 

z£ = XX' and (!) 

vC.u 

r\= E ( 2 ) 

These satisfy ^ ^ f 2 u ^ a 2 and = a 2 — t 2 _ u , where a 2 is the variance 
J (f(x) — fi) 2 dx. We use — u or u c depending on typographical readability, to 
denote the complement of u in T>. These indices provide two measures of the 
importance of the variables in subset u. The larger measure includes interac- 
tions between variables in u and variables in its complement, while the smaller 
measure excludes those interactions. 

If we unite the part of x € [0, l) d corresponding to indices in the set u with 
the part of another point y € [0, l] d for indices in —u, then the resulting point 
is denoted x u :y_ u . Most estimation strategies for Sobol' indices are based on 
the identities 

tI = [i 2 + J f(x)f(x u :y_ u )dxdy, and (3) 
rl = ~ J (/(*) - f{x u :y_ u )f dx dy, (4) 
with integrals taken over x and y G [0, l] d . 
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When t 2 is small, then (QJ leads to a very effective Monte Carlo strategy 
based on 



2 1 

7 « = otX)^^*) ~ /(^^-J) 2 



2n 

8=1 



for {xi,Zi) ~ U[0, This estimator is a sum of squares, hence nonnegative, 
and it is unbiased. If the true r 2 = 0, then r 2 — with probability one. 
More generally, if the true r 2 is small, then the estimator averages squares of 
typically small quantities. We assume throughout that J f(x) 4 dx < oo so that 
the variance of this and our other estimators is finite. 
The natural way to estimate r 2 is via 

n 

i=l 



The simplest estimator of p, is (1/n) X^ILi f{ x i) but Ijanon et al.l ( 2012 ) have 



recently proved that it is better to use fi = (l/2n) X)"=i(/( a; i) + f( x i,u'yi _«))• 



3 The estimators 

The problem with ([5]) is that it has very large variance when r 2 <C /z 2 . Although 
r 2 is invariant with respect to shifts replacing f(x) by f(x) — c for a constant c , 
the variance of ([5]) can be strongly affected by such shifts. ISobol'l (|l990l Il993l) 



recommends shifting / by an amount close to /i, which while not necessarily 
optimal, should be reasonable. 

An approximation to /i can be obtained by Monte Carlo or quasi-Montc 
Carlo simulation prior to estimation of r 2 . In our simulations we suppose that 
an oracle has supplied fi and then we compare estimators that do and do not 
benefit from the oracle. 

Another estimato r of t 2 was con sidered independently bv lSaltelli (2002) and 



the Masters thesis of iMauntd (|2002h under the supervision of S . S. Kucherenko 



and C . Pantelides. This estimator, called correlated sampling bv lSobol' and Mvshetskayal 



(|2007l ) replaces f(x i>u :y i u ) by f(x ijU :y i u ) - f(y) in © and then it is no longer 
necessary to subtract jj, 2 . Indeed the method can be viewed as subtracting the 
estimate n~ 2 XT=i 2™=i f( x i)f(yi') from the sample mean of f(x)f(x u :y_ u ). 
That estimator is called "Correlation 1" below. 



Sobol' and Mvshetskaval ( 2007t l find that even the correlated sampling method 



has increased variance when fj, is large. They propose another estimator replac- 
ing the first f(xi) by /(a;,) — c for a constant c near /j,. Supposing that an oracle 
has supplied c = /iwe call the resulting method "Oracle 1" because it makes 
use of the true /i one time. One could also make use of the oracle's fi in both 
the left and right members of the cross moment pair. We call this estimator 
"Oracle 2" below. The fourth method to compare is a new estimator, called 
"Correlation 2", that uses two random offsets. Instead of replacing /(ccj) by 
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Name 


Expectation 


Cost 


Original ([5]) 


v 2 + zi 


2 


Correlation 1 


Ll 


3 


Correlation 2 


z 2 


4 


Oracle 1 


z 2 


3 


Oracle 2 


zl 


2 



Table 1: Estimators of r„ with expected value and number of function values 
required per sample. 

f(xi) — fi it draws a third variable z ~ U[0, l] d and is based on the identity 
(f(x) - f(z u :x- a ))(f(x a :y_ u ) - f(y)) dxdydz 



= (M 2 +z')-M 2 -M 2 + M 2 -zl (6) 
We compare the following estimators 
1 " 

- Y] f(^i)(f(^i,u-Vi,-u) - f(y)) (Correlation 1) 

i=l 
1 - 

- }if( x i) - f{ z i,u-x l -u){f{xi, u -y i _J - f(y)) (Correlation 2) 
1 - 

- Yiffri) - M)(/(a5i,«:Vi,-u) ~ f(v)) (° racle 1) 
n f— ' 

i— i 

1 " 

- - ^){f{x l ,u-y l .- u ) - M) (Oracle 2) 



Z— 1 



where (xi,y i7 Zi) ~ U[0, l] 3d for i = l,...,n. Not all components of these 
vectors are necessary to estimate for a single u, but many applications seek 

for several sets u at once, so it is simpler to write them this way. Also, 
the cost is assumed to be largely in evaluating /, not in producing the inputs 
(xi,y i: zi). The properties of these estimators are given in Table [1] 

The intuitive reason why "Correlation 2" should be effective on small indices 
is as follows. If the variables in the set u are really unimportant then /(x) 
will be determined almost entirely by the values in x_ u . Then both f(xi) — 
f(zi tU :Xi t - u ) and f{xi tU :y i _ u ) — f(y) should be small values, even smaller than 
by centering at /Lt, and so the estimator takes a mean product of small quantities. 

We do not compare the original estimator ([S]) . The bias correction makes it 
more complicated to describe the accuracy of this method. Also th at estimator 



had extremely bad performance in lSobol' and Mvshetskaval (|20071 ). 
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Set u 


z?> 2 


Corr 1 


Corr 2 


Orel 1 


Orel 2 


{1} 


0.048 


1 


4256 


518 


74 


{2} 


0.190 


1 


1065 


525 


297 


{3} 


0.762 


1 


267 


556 


1329 


{1,2} 


0.238 


1 


774 


503 


364 


{1,3} 


0.809 


1 


243 


529 


1306 


{2,3} 


0.952 


1 


194 


473 


1261 



Table 2: Relative efficiencies of 4 estimators of for the g-function, rounded 
to the nearest integer. Relative indices r^/<r 2 rounded to three places. 

4 Examples 
4.1 g function 



This is the example used by ISobol' and Mvshetskaval (|2007h . It has d = 3 and 



/ ( *)=n |4 ^- 2|+2+3a . 

M i + a j 

This function has // = 27 and cr^ 1} = 0.0675, <r 2 2} = 0.27, a 2 {3} = 1.08, (7? 12} = 
0.000025, a% 3} = 0.0001, £t| 23} = 0.0004, cr| 23} = 0.0004, and cr^ 2 3} = 
3.7 x 10~ 8 . The smallest and therefore probably the most difficult to estimate 
is Z{i} = a {i}- That is the one that they measure. 

They report numerical values of T{iy/"l3i\ for the four estimates in Table Q] 
(exclusive of the new "Correlation 2" estimate) based on n — 256,000 samples. 
The original estimator gave a values 2.239 times as large as the true Z?u- The 
others were ranged from 0.975 to 1.104 times the true value. They did not use 
the oracle for /i, but centered their estimator instead on c = 26.8 to investigate 
a somewhat imperfect oracle. 

The four estimators we consider here are all simply sample averages. As a 
result we can measure their efficiency by just estimating their variances. The 
efficiencies of these methods, using "Correlation 1" as the baseline are given by 

3 Var(corr 1) Var(corr 1) 3Var(corrl) 

-E-corr 2 = T 77 ; , ^orcl 1 = TT 7 Trr" , an( l ^ord 2 = TT 77 7 7777 

4 Var(corr 2) Var(orcl 1) 2 Var(orcl 2) 

where the multiplicative factors accounts for the unequal numbers of function 
calls required by the methods. 

The efficiencies of the four estimators are compared in Table [2] based on 
n = 1,000,000 function evaluations. This is far more than one would ordinarily 
use to estimate the indices themselves, but we are interested in their sampling 
variances here. We consider all sets except u = {1, 2, 3} because rfj 2 3 j = a 2 
which can be estimated more directly. The table contains one small index t 2 ^, 
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Set u 


T 2 AX 2 
— ul 


Corr 1 


Corr 2 


Orel 1 


Orel 2 


{1} 


0.165 


1 


0.74 


1.13 


1.23 


{2} 


0.165 


1 


0.73 


1.14 


1.24 


{3} 


0.041 


1 


1.69 


1.15 


0.54 


{4} 


0.041 


1 


1.67 


1.15 


0.54 


{5} 


0.010 


1 


5.45 


1.16 


0.20 


{6} 


0.010 


1 


5.58 


1.16 


0.20 


{1,2} 


0.826 


1 


0.75 


1.21 


1.86 


{3,4} 


0.176 


1 


1.23 


1.16 


0.94 


{5,6} 


0.042 


1 


2.94 


1.16 


0.38 



Table 3: Relative efficiencies of 4 estimators of r 2 for the product function ([7]). 
Relative indices t^/ct 2 rounded to three places. 



(the one lSobol' and Mvshetskava (|2007h studied). O n the small effect, the new 
Correlation 2 estimator is by far the most efficient, outperforming both oracles. 
Inspecting the table, it is clear that it pays to use subtraction in both left and 
right sides of the estimator and that the smaller the effect t 2 is, the better it is 
to replace the oracle's /z with a correlation based estimate. 



4.2 Other product functions 

It is convenient to work with functions of the form 

d 

/(*) = n>j w 

where J Q g{x) dx = 0, J Q g(x) 2 dx — 1, and J Q g(x) 4 dx < oo. For this function 
< = EU„ rf i\^u 4 Taking g(x) = ^2{x - 1/2), d = 6, M = (1, 1, 1, 1, 1, 1) 
and r = (4, 4, 2, 2, 1, l)/4 and sampling n — 1,000,000 times lead to the results 
in TableEl The results are not as dramatic as for the ^-function, but they show 
the same trends. The smaller t 2 is, the more improvement comes from the new 
estimator. On the smallest indices it beats both oracles. 

The improvements for the ^-function are much larger than for the product 
studied here. For the purposes of Monte Carlo sampling the absolute value cusp 
in the g-function makes no difference. The (/-function has the same moments 
as the product function with fij = 3 and Tj = l/(v / 3aj). Computing the g 
function estimates with the product function code (as a check) yields the same 
magnitude of improvement seen in Table [2] 



5 Some generalizations and a recommendation 

The best unbiased estimator of r 2 is the one that minimizes the variance after 
making an adjustment for the number of function calls. Unfortunately variances 
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of estimated variances involve fourth moments which are harder to ascertain 
than the second moments underlying the ANOVA decomposition. 

5.1 More general centering 

The estimators in Section |4] are all formed by taking pairs f(x) and f(x u :y_ u ). 
subtracting centers from them, and averaging the product of those two centered 
values. Where they differ is in how they are centered. 

We can generalize this approach to a spectrum of centering methods. 

Theorem 1. Let v and v' be two subsets of u c and let x, y, w, z be independent 
U[0, 1] random vectors. Then 

E((f( X )-f(x v :z- v ))(f(x u :y_ u )-f(y v ,:w- v ,))) = t\. (8) 

Proof. 

e((/(sb) - /(*„:*-,)) {f{x u :y- u ) - f(y v r.W- v ,))) 
= (P 2 + zl) - (M 2 + r%) - (/i 2 + r 2 unv ) + (p 2 + zl) 

= T 2 

because uC\v — and t% — 0. □ 
As a result of Theorem [1] we may estimate t 2 by 

1 n 

- ^2(f( x i) - f( x i,v- z i-v))(f{xi,u:yi- u ) - f{y iy :Wi,- v ,)) (9) 

i— 1 

where (x i; y t , w h z t ) ~ U(0, l) 4d . 

The new estimate (J9j) uses up four independent vectors, not the three used in 
the Correlation 2 estimator, so we should check that it really is a generalization. 

First, suppose that v' = u c . Then the only part of the vector w that is used 
in ((9} is W- V i = w u . Because ((9J does not use y u the needed parts of y and w 
fit within the same vector. That is we can sample y as before and use y u for 
w u . As a result when v' = u c we only need three vectors as follows: 

1 " 

- X) - f(x*y-^-v)) U(*iy-Vi,-u) - /(»))• (io) 

i=l 

If we take v = u c too, then (fTUj) reduces to the Correlation 2 estimator. 

At first, it might appear that the Oracle 2 estimator arises by taking v = 
v' = 0, but this is not what happens, even when /j, = 0. A more appropriate 
generalization of the oracle estimators is to based on the identity 

1.1 = E ((/( a; ) - Hv{x v ))(f{x u :z^ u ) - /v(Zu'))) 
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where fJ. v (x v ) — E(f(x) \ x v ) and v, v' C u c . To turn this identity into a 
practical estimator requires estimation of these conditional expectations. For 
v — v' — the conditional expectations become the unconditional expectation, 
which is simply the integral of /. For other v and v', such estimation requires 
something like nonparametric regression, with bias and variance expressions 
that complicate the analysis of the resulting estimate. 

5.2 Recommendation 

The Correlation 2 estimator has v = v' = u c , so it holds constant all of the 
variables in a;_ u . From Theorem [1] we see that this is just one choice among 
many and it raises the question of which variables should be held fixed in a 
Monte Carlo estimate of . The result is that we find taking v = v' = u c to be 
a principled choice. 

We can get some insight by considering functions of product form. Even 
there the resulting variance formulas become cumbersome, but simplified ver- 
sions yield some insight. We can write it as 

d 

/(*) = IIfcifo) t 11 ) 

i=i 

where hj(x) = fij +Tjgj(x) with f Q gj (x) p dx taking values 0, 1, jj and Kj for 
p = 1, 2, 3, and 4 respectively. In statistical terms, the random variable hj(x) 
has skewness 7j/r? and kurtosis Kj/rj — 3 if x ~ U[0, 1] and tj > 0. We will 
suppose that all tj and that all Kj < oo. 

Proposition 1. Letr^ be given by (0), where f is given by the product model 
Then, for v, v' C u c , 

nVa,r(zl) = E(Q v (x, y, z, w)Q uv , (x, y, z, w)) - r * 

for x, y, z, w ~ U[0, l] d where 
d 

Qv 1 1 ^fe) + II h]( Xj ) \\ h)(z 2 ) - 2 \\ K)(xj) J] hiixrfhjizj), and 

j=l j£v jgv j£v j^v 

Quv = n n ^ 2 (%) + n ^ 2 (%) n ^(wj) 

-2 Yl h2 ](Vj) II h A x i) h ii w o) II h j(yj) h i( w i) 

j£u c nv' j(EuC]v' c j£u c C\v' c 

Proof. We need the expected square of the quantity inside the expectation in 
equation ©. First we expand 

d 

f{x) - f(x v :z_ v ) = J[ hj(xj) ~ Y[ h i( x j) II h j( z i)- 
j=i jev jgv 
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Squaring this term yields Q v , and similarly, squaring 

f{xu-y- u ) - f{vv'- w -v) = n h o( x j) n h i(vj) - n n 

j£u jgu j€v' j^v' 

yields Q U v', after using udv' = 0. □ 

Using Proposition [1] we can see what makes for a good estimator in the 
product function setting. The quantities Q v and Q U v' should both have small 
variance and their correlation should be small. The latter effect is very compli- 
cated depending on the interplay among u, v and v', and one might expect it 
to be of lesser importance. So we look at E(Q^) for insight as to which indices 
should be in v. Then we suppose that it will usually be best to take the same 
indices for both v and v'. 

Theorem 2. Letr^ be given by where f is given by the product model (jlip 
and let Q v be as defined in Proposition {T]) . Then Q v is minimized over v C u c 
by taking v = u c . 

Proof. Let fi^j = hj(x) 4 dx and /i 2 j = Jq hj(x) 2 dx. It is elementary that 
fi4j ^ /if,. Expanding E(Q^) and gathering terms yields, 

d d 

n w + n ^ + 4 n ^ n ^% + 2 n ^ n ^ 

j=i j=i jev j^v jev jgv 

4 n ^ n $3 - 4 n ^ n ^ 

jev j^v jev j^v 

d 

= 2 JJ fj, i:j - 2 Yl M4j ]J • 

We minimize this expression by taking the largest possible set v C u c , that is 
v = u c . □ 

From Theorem [21 we see that the Correlation 2 estimator minimizes E(Q^) 
for product functions among estimators of the form ([9]) . 
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